A Comparative Study on Arabic Text Classification

Author

Alaa El-Halees

Abstract

This paper focuses on automatic Arabic text classification. Arabic is a highly inflectional and derivational language, which makes text mining a complex task. Many experimental results on classifying Arabic text have been published. Because these results come from different datasets, authors, and evaluation metrics, the performance of the classifiers tested cannot be compared directly. In this paper, we compare six well-known classifiers, namely Maximum Entropy, Naïve Bayes, Decision Tree, Artificial Neural Networks, Support Vector Machine, and k-Nearest Neighbor, using the same datasets and the same experimental settings. The recall, precision, and F-measure of each classifier are computed and compared. The comparison is then repeated after applying feature selection to the Arabic datasets.
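As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch of how per-class precision, recall, and F-measure can be computed from true and predicted labels. It is not the paper's code: the function names, the category labels, and the use of macro-averaging across classes are assumptions made for this example, and the abstract does not say which averaging scheme was used.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1          # correct prediction for this class
        else:
            fp[guess] += 1          # predicted class got a false positive
            fn[truth] += 1          # true class got a false negative
    scores = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f_measure)
    return scores


def macro_average(scores):
    """Unweighted mean of each measure across classes (an assumed choice)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical labels, for illustration only.
y_true = ["sport", "sport", "economy", "economy"]
y_pred = ["sport", "economy", "economy", "economy"]
scores = per_class_scores(y_true, y_pred)
macro = macro_average(scores)
```

Running every classifier through the same scoring function on the same test split is what makes the resulting numbers comparable, which is the point the abstract makes about earlier published results.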

Similar resources

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, a document image is enhanced by removing noise, binarizing it, and detecting and correcting image skew. In document segmentation, an algorith...

Topicalization in English Translation of the Holy Quran: A Comparative Study

The Holy Quran, as an Arabic masterpiece, comprises great domains of syntactical, phonological, and semantic literary patterns. These patterns work as the shackle of translators. This study examined the application of the most common shift strategies in Catford's linguistic model for translation of topicalization in chapter 29 of the Holy Quran. The topicalized cases were compared to their coun...

A comparative study of the text inside the Mihrabi rug by Zareh Penyamin and Topkapi Palace Museum according to the existing discourse in the 16th and 19th

In the city of Hereke, Turkey, rugs known as Mihrabi became popular at the end of the 19th century. They were inspired by rugs of the Safavid era kept in the Topkapi Palace Museum. In these rugs, which were reproduced in royal workshops on a large scale, some changes were made to the verbal text and the incorporated visual elements. Among the rugs that seem to have ...

New stemming for arabic text classification using feature selection and decision trees

In this paper we conduct a comparative study of two stemming algorithms for Arabic text classification (categorization): the Khoja stemmer and our new stemmer. We use chi-square statistics for feature selection and focus on the decision tree classifier. Evaluation used a corpus that consists of 5070 documents independently classified into six categories: sport, entertainment, business, middle east...

Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study

This paper presents the results of classifying Arabic text documents using the N-gram frequency statistics technique employing a dissimilarity measure called the “Manhattan distance”, and Dice’s measure of similarity. The Dice measure was used for comparison purposes. Results show that N-gram text classification using the Dice measure outperforms classification using the Manhattan measure.

Journal:

Egyptian Computer Science Journal

Volume 30

Publication date: 2008
